A Generic Model to Compose Vision Modules for Holistic Scene Understanding
نویسندگان
چکیده
The problem of holistic scene understanding involves many vision tasks such as depth estimation, scene categorization, event categorization, etc. Each of these tasks explores some aspects of the scene but, these tasks are related in that, they represent attributes of the same scene. An intuition is that one task can provide meaningful attributes to aid the learning process of another task. In this work, we propose a generic model (together with learning and inference techniques) for connecting different vision tasks in the form of a 2-layer cascade. Our model considers the first layer as a hidden layer, where the latent variables are inferred by feedback from the second layer. The feedback mechanism allows the first layer classifiers to focus on more important image modes, and draws their output towards “attributes” rather than the original “labels”. Our model also automatically discovers sparse connections between the learned attributes on the first layer and the target task on the second layer. Note that in our model, the same vision tasks can act as attribute learners as well as target tasks, while being set up on different layers. In extensive experiments, we show that the same proposed model improves the performance in all the tasks we consider: single image depth estimation, scene categorization, saliency detection and event categorization.
منابع مشابه
Combining Plane Estimation with Shape Detection for Holistic Scene Understanding
Structural scene understanding is an interconnected process wherein modules for object detection and supporting structure detection need to co-operate in order to extract cross-correlated information, thereby utilizing the maximum possible information rendered by the scene data. Such an inter-linked framework provides a holistic approach to scene understanding, while obtaining the best possible...
متن کاملHuman-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding
Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance based classifiers. In this work, we are interested in understanding the roles of these different tasks in improved scene understanding, in particular semantic segme...
متن کاملKnowledge Representation and Inference for Grasp Affordances
Knowledge bases for semantic scene understanding and processing form indispensable components of holistic intelligent computer vision and robotic systems. Specifically, task based grasping requires the use of perception modules that are tied with knowledge representation systems in order to provide optimal solutions. However, most state-of-the-art systems for robotic grasping, such as the KCoPM...
متن کاملFrom holistic scene understanding to semantic visual perception: A vision system for mobile robot
Semantic visual perception for knowledge acquisition plays an important role in human cognition, as well as in the many tasks expected to be performed by a cognitive robot. In this paper, we present a vision system designed for indoor mobile robotic systems. Inspired by recent studies on holistic scene understanding, we generate spatial information in the scene by considering plane estimation a...
متن کاملAutomated Scene Understanding for Airport Aprons
This paper presents a complete visual surveillance system for automatic scene interpretation of airport aprons. The system comprises two main modules — Scene Tracking and Scene Understanding. The Scene Tracking module is responsible for detecting, tracking and classifying the semantic objects within the scene using computer vision. The Scene Understanding module performs high level interpretati...
متن کامل